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Abstract. We generalize univariate multipoint evaluation of polynomi- 
als of degree n at sublinear amortized cost per point. More precisely, it 
is shown how to evaluate a bivariate polynomial p of maximum degree 
less than n, specified by its coefficients, simultaneously at given 
points using a total of 0{n^'^^'') arithmetic operations. In terms of the 
input size A'^ being quadratic in n, this amounts to an amortized cost of 
C)(ArO-334) per point. 



1 Introduction 

By Horner's Rule, any polynomial p of degree less than n can be evaluated at a 
given argument x in 0(n) arithmetic operations which is optimal for a generic 
polynomial as proved by Pan (1966), see for example Theorem 6.5 in Biirgisser, 
Clausen & Shokrollahi (1997). 

In order to evaluate p at several points, we might sequentially compute p{xk) 
for < A: < n. However, regarding that both the input consisting of n coefficients 
of p and n points Xk and the output consisting of the n values p{xk) have 
only linear size, information theory provides no justification for this quadratic 
total running time. In fact, a more sophisticated algorithm permits to compute 
all p{xk) simultaneously using only 0{n ■ log^ n • log log n) operations. Based 
on the Fast Fourier Transform, the mentioned algorithms and others realize 
what is known as Fast Polynomial Arithmetic. For ease of notation, we use the 
'soft-Oh' notation, namely 0~(/(n)) := O (/(n)(log/(n))'='(i)). This variant 
of the usual asymptotic 'big-Oh' notation ignores poly-logarithmic factors like 
log^ n ■ log log n. 



Fact 1. Let R be a commutative ring with one. 

(i) Multiplication of univariate polynomials: Suppose we are given polynomials 
p,q ^ R[X] of degree less than n, specified by their coefficients. Then we can 
compute the coefBcients of the product polynomial p-q £ R[X] using 0~ (n) 
arithmetic operations in R. 
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(a) Multipoint evaluation of a univariate polynomial: Suppose we are given a 
polynomial p E of degree less than n, again specified by its coef- 

ficients, and points xq, . . . ,Xn-i & R- Then we can compute the values 
p(xo), . . . ,p(x„_i) G R using 0""{n) arithmetic operations in R. 

(Hi) Univariate interpolation; Conversely, suppose wc arc given points {xk, Vk) € 
R^ for < k < n such that Xk — xi is invertible in R for all k ^ £. Then we 
can compute the coefRcients of a polynomial p € R[X] of degree less than 
n such that p{xk) = Uj, < k < n, that is, determine the interpolation 
polynomial to data {xk, jjk) using ©^(n) arithmetic operations in R. 

Proof. These results can be found for example in von zur Gathen & Gerhard 
(2003) including smah constants: 

(i) can be done using at most 63.427-n-log2 n-log2 log2 n + 0{n logn) arithmetic 
operations in R by Theorem 8.23. The essential ingredient is the Fast Fourier 
Transform. If i? = C then even |nlog2 n+0{n) arithmetic operations suffice. 
This goes back to Schonhage & Strassen (1971) and Schonhage (1977). 

In the following M (n) denotes the cost of one multiplication of univariate poly- 
nomials over R of degree less then n. 

(ii) can be done using at most ^ M (n) log2 n + 0{n log n) operations in R accord- 
ing to Corollary 10.8. Here, Divide & Conquer provides the final building 
block. This goes back to Fiduccia (1972). 

(iii) can be done using at most ^M(n)log2n -|- 0{n\ogn) operations in R ac- 
cording to Corollary 10.12. This, too, is completed by Divide & Conquer. 
The result goes back to Horowitz (1972). 

You also find an excellent account of all these in Borodin & Munro (1975). □ 

Fast polynomial arithmetic and in particular multipoint evaluation has found 
many applications in algorithmic number theory (see for example Odlyzko & 
Schonhage 1988), computer aided geometric design (see for example Lodha & 
Goldman 1997), and computational physics (see for example Ziegler 2003b). 

Observe that the above claims apply to the univariate case. What about 
multivariate analogues? Let us for a start consider the bivariate case: A bivariate 
polynomial p € R[X,Y] of maximum degree maxdegp := max {deg^ p, degy p} 
less than n has up to coefficients, one for each monomial X'^Y^ with < 
i,j < n. Now corresponding to Fact 1, the following questions emerge: 

Question 2. (i) Multiplication of bivariate polynomials: Can two given bivari- 
ate polynomials of maximum degree less than n be multiplied within time 

(ii) Multipoint evaluation of a bivariate polynomial; Can a given bivariate poly- 
nomial of maximum degree less than n be evaluated simultaneously at 
arguments in time O'^(n^)? 

(iii) Bivariate interpolation; Given points (xk, jjk, Zk) G R"^ , is there a polyno- 
mial p € R[X,Y] of maximum degree less than n such that p{xk,yk) = Zk 
for alio < k <n^? And, if yes, can we compute it in time O'^(n^)? 
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Such issues also arise for instance in connection with fast arithmetic for polyno- 
mials over the skew-field of hypercomplex numbers (Ziegler 2003a, Section 3.1). 

A positive answer to Question 2(i) is achieved by embedding p and q into 
univariate polynomials of degree 0{n^) using the Kronecker substitution Y = 
applying Fact l(i) to them, and then re-substituting the result to a bi- 
variate polynomial; see for example Corollary 8.28 in von zur Gathen & Gerhard 
(2003) or Section 1.8 in Bini & Pan (1994). 

Note that the first part of (iii) has negative answer for instance whenever the 
points {xk,yk) are co-linear or, more generally, lie on a curve of small degree: 
Here, a bivariate polynomial of maximum degree less than n does not even exist 
in general. 

Addressing (ii), observe that Kronecker substitution is not compatible with 
evaluation and thus of no direct use for reducing to the univariate case. The 
methods that yield Fact l(ii) are not applicable either as they rely on fast poly- 
nomial division with remainder which looses many of its nice mathematical and 
computational properties when passing from the univariate to the bivariate case. 

Nevertheless, (ii) does admit a rather immediate positive answer provided the 
arguments {xk,yk), < fc < form a Cartesian n x n-grid (also called tensor 
product grid). Indeed, consider p(X, F) = X]o<j<ri '^^ polynomial in 

Y with coefficients qj being univariate polynomials in X. Then multi-evaluate qj 
at the n distinct values Xk- as qj has degree less than n, this takes time 0~(n) 
for each j, adding to a total of 0"'{n'^). Finally take the n different univariate 
polynomials p(xk, Y) in Y of degree less than n and multi-evaluate each at the 
n distinct values ye: this takes another O'^(n^). 

The presumption on the arguments to form a Cartesian grid allows for a 
slight relaxation in that this grid may be rotated and sheared: Such kind of 




Fig. 1. Cartesian 8 x 8-grid, same rotated and sheared; 64 generic points. 



afRne distortion is easy to detect, reverted to the arguments, and then instead 
appHed to the polynomial p by transforming its coefficients within time C"(n^), 
see Lemma 14 below. The obtained polynomial p can then be evaluated on the 
now strictly Cartesian grid as described above. However, nxn grids, even rotated 
and sheared ones, form only a zero-set within the 2n^-dimensional space of all 
possible configurations of points. Thus this is a severe restriction. 
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2 Goal and Idea 

The big open question and goal of the present work is concerned with fast mul- 
tipoint evaluation of a multivariate polynomial. As a first step in this direction 
we consider the bivariate case. 

The nai'vc approach to this problem, namely of scquentialh^ calculating all 
p{xk,yk), takes quadratic time each, thus inferring total cost of order n^. A first 
improvement to 0~(n^) is based on the simple observation that any n points 
in the plane can easily be extended to an n x n grid on which, by the above 
considerations, multipoint evaluation of p is feasible in time 0^(71^). So we 
may partition the v? arguments into n blocks of n points and multi-evaluate p 
sequentially on each of them to obtain the following 

Theorem 3. Let R he a commutative ring with one. A bivariate polynomial 

p G F] of dcg^(p) < n and dcgy(p) < n, given by its coefBcients, can be 
evaluated simultaneously at given arguments {xk,yk) using at most 0{n^ ■ 
log^ n ■ log log n) arithmetic operations in R. 

We reduce this softly cubic upper complexity bound to £)(n^'^^''). More pre- 
cisely, by combining fast univariate polynomial arithmetic with fast matrix mul- 
tiplication we will prove: 

Result 4. Let K denote an arbitrary field. A bivariate polynomial p € K[X, Y] 
of dcgx (p) < n and dcgy (p) < m, specified by its coefficients, can be evaluated 
simultaneously at N given arguments {xk, Uk) G ^ with pairwise different first 
coordinates using O [{N + nm)m'^^/^~^+^) arithmetic operations in K for any 
fixed £ > 0. 

Here, (jJ2 denotes the exponent of the multiplication of n x n- by rectangular 
n X n^-matrices, see Section 3. In fact this problem is well-known to admit a 
much faster solution than naive O(n^), the current world record u)2 < 3.334 
being due to Huang & Pan (1998). By choosing m = n and N = n^, this yields 
the running time claimed in the abstract. 

The general idea underlying Result 4, illustrated for the case of n = m, 
is to reduce the bivariate to the univariate case by substituting Y in p{X, Y) 
with the interpolation polynomial g{X) of degree less than to data {xk,yk)- 
It then suffices to multi-evaluate the univariate result p{X,g{X)) at the 
arguments Xk . Obviously, this can only work if such an interpolation polynomial 
g is available, that is any two evaluation points {xk,yk) 7^ {xk' , yk') diflPer in their 
first coordinates, Xk 7^ xy ■ However, this condition can be asserted easily later 
on, see Section 6, so for now assume it is fulfilled. 

This naive substitution leads to a polynomial of degree up to 0{n^). On the 
other hand, it obviously suffices to obtain p{X,g{X)) modulo the polynomial 
f{X) := no<fe<n2(^ ^ ^fe) which has degree less than . The key to cfircicnt 
bivariate multipoint evaluation is thus an efficient algorithm for this modular 
bi-to-univariate composition problem, presented in Theorem 9. 
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As we make heavy use of fast matrix multiplication, Section 3 recalls some 
basic facts, observations, and the state of the art in that field of research. Sec- 
tion 4 formally states the main result of the present work together with two 
tools (afEne substitution and modular composition) which might be interesting 
on their own, their proofs being postponed to Section 5. Section 6 describes three 
ways to deal with arguments that do have coinciding first coordinates. Section 7 
gives some final remarks. 

3 Basics on Fast Matrix Multiplication 

Recall that, for a field K, a; = w(K) > 2 denotes the exponent of matrix multipli- 
cation, that is, the least real such that mxm matrix multiplication is feasible in 
asymptotic time 0(77?,'^+^) for any e > 0; see for example Chapter 15 in Biirgisser 
et al. (1997). The current world-record due to Coppersmith & Winograd (1990) 
achieves uj < 2.376 independent of the ground field K. The Notes 12.1 in von zur 
Gathen & Gerhard (2003) contain a short historical account. 

Clearly, a rectangular matrix multiplication of, say, m x m-matrices by m x 
m*-matrices can always be done partitioning into mxm square matrices. Yet, 
in some cases there arc better known algorithms than this. Wc use the nota- 
tion introduced by Huang & Pan (1998): uj{r,s,t) denotes the exponent of the 
multiplication of [m''] x [m*]- by [m*] x [m*] -matrices, that is 

Multiplication of [m'"] x [to"]- by"! 
[to*] X [m*] -matrices can be done > . 
with 0{m'^) arithmetic operations J 

Clearly, w = ui{l, 1, 1). We always have 

max{r + s,r + t,s + t} < ui{r,s,t) < r + s + t. (5) 

Note that w(r, s, t) is in fact invariant under permutation of its arguments. 
We collect some known bounds on fast matrix multiplication algorithms. 

Fact 6. (i) uj = 1, 1) < log2(7) < 2.8073549221 (Stiassen 1969). 
(ii) w = uj{l, 1, 1) < 2.3754769128 (Coppersmith k Wmogi&d 1990). 
(in) L02 := a;(l, 1, 2) < 3.3339532438 (Huang k Pan 1998). 

Partitioning into square matrices only yields lu2 ^ + ^ < 3.3754769128. 
Bounds for further rectangular matrix multiplications can be also be found 
in Huang & Pan (1998). It is conjectured that to = 2. Then by partition- 
ing into square blocks also uj{r,s,t) touches its lower bound in (5), that is 
Lu{r, s, t) = max {r + s,r + t, s + t}. In particular, W2 = 3 then. 

We point out that the definition of u and u){r,s,t) refers to arbitrary alge- 
braic computations which furthermore may be non-uniform, that is, use for each 
matrix size to a different algorithm. However, closer inspection of Section 15.1 
in Biirgisser et al. (1997) reveals the following 



Lo{r, s, t) = mi <T G 
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Observation 7. Rectangular matrix multiplication of\m^] x [m*]- by [m*] x 
\ 171*1 -matrices over K can be done with C'(m'^^'"''''*'"'"^) arithmetic operations in 
K by a uniform, bilinear algorithm for any fixed e. 

A bilinear computation is a very special kind of algorithm where apart from 
additions and scalar multiplications only bilinear multiplications occur; see for 
example Definition 14.7 in Biirgisser et al. (1997) for more details. In particular, 
no divisions are allowed. 



4 Main results 

Our major contribution concerns bivariate multi-evaluation at arguments {xk, jjk) 
under the condition that their first coordinates Xk are pairwise distinct. This 
amounts to a weakened general position presumption as is common for instance 
in Computational Geometry. 

For notational convenience, we define '£)-' (smooth-Oh) which, in addition 
to polylogarithmic factors in n, also ignores factors as long as e > can 
be chosen arbitrarily small. Formally, 0"{f{n)) := n£>o ^(/('^)^^^)- Note that 
0-{f{n)) C 0^{f{n)). 

Theorem 8. Let K denote a field. Suppose n, m € N. Given the nm coefRcients 
of a bivariate polynomial p with dcgx (p) < n and degy (p) < m and given nm 
points {xk,yk) G K^, < fc < nm such that the first coordinates Xk are pairwise 
different, we can calculate then values p{xk,yk) using (nm'^^/^) arithmetic 
operations over K. The algorithm is uniform. 

Observe that this yields the first part of Result 4 by performing \N/ (nm)] 
separate multipoint evaluations at nm points each. Let us also remark that 
any further progress in matrix multiplication immediately carries over to our 
problem. As it is conjectured that ui = 2 holds, this would lead to bivariate 
multipoint evaluation within time 0^(nm^'^). 

Our proof of Theorem 8 is based on the following generalization of Brent & 
Kungs efficient univariate modular composition, see for example Section 12.2 in 
von zur Gathen & Gerhard (2003), to a certain ' bi-to-univariate^ variant: 

Theorem 9. Fix a field K. Given n,m G N, a bivariate polynomial p € K[X, Y] 
with dcg Y(p) < n and dcgy(p) < m and univariate polynomials g, f € K[X] of 
degree less than nm, specified by their coefRcients. Then p[X, g{X)) rem/(X) 
can be computed with 0'^{nm'^^/'^) arithmetic operations in K. 

We remark that true bivariate modular computation requires Grobner basis 
methods which for complexity reasons are beyond our interest here. 
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5 Proofs 

Now we come to the proofs. 

Lemma 10. Let IK denote a field and fix t > 0. 

(i) Let both A be an m x m-matrix and B an m x m* -matrix whose entries 
consist of polynomials aij{X),bij{X) € K.[X] of degree less than n. Given 
m and the n ■ (m^ + m*) coefficients, we can compute the coefficients of the 
polynomial entries Cij{X) of C := A ■ B within 0'^{nm'^^^'^'^^) arithmetic 
operations. 

(a) If A denotes an mxm square matrix with polynomial entries of degree less 
than n and b denotes an m-component vector of polynomials of degree less 
than nm} , then [A, b) ^ A-h is computable within 0^(nrn'^(^'-^'*)). 

(Hi) Let po, . . . ,Pm-i € K[X, Y] denote bivariate polynomials with degxipi) < n 
and dcgyipi) < m, given their nm^ coefRcients, and let furthermore uni- 
variate polynomials g,f& K[X] of degree less than nm* be given by their 
coefficients. Then the coefficients of the m univariate polynomials 

Pi{X,g{X)) rem/(X) 

can be computed with £)'^(nm'^^^'^'*^) arithmetic operations. 

In particular, for t = 1 we have cost 0~{nm'^) C £)'^(nm^'^^^) and for t = 2 we 
have cost 0~{nm'^^) C 0~{nm^-^^^). 

Proof, (i) By scalar extension to i? = K[X] we obtain an algoritiun with cost 
C'~(m'^*^^'^''') arithmetic operations in R using Observation 7. For the algo- 
rithm scalar extension simply means that we perform any multiplication in 
R instead of K, multiplications with constants become scalar multiplications. 
And the cost for one operation in R is C{n) as only polynomials of degree 
n have to be multiplied. 

(ii) For each j, < j < m, decompose the polynomial bj of degree less than 
nm* into m* polynomials of degree less than n, that is, write bj{X) = 
So<fe<m* ^jki^) ■ X'''^. The desired polynomial vector is then given by 

{A-b)^{X)= J2 E b,,{X)-X'^-) 

l<j<m 0<fe<m* 

- E {^■B)JX).X''- 

0<fe<m' 

where < i < m and B := {bjk) denotes an m x m* matrix of polynomials 
of degree less than n. The product A- B can be computed according to (i) in 
the claimed running time. Multiplication by X'^" amounts to mere coefficient 
shifts rather than arithmetic operations. And observing that deg [{A-B)ik) < 
2n, only two consecutive terms in the right hand side of (*) can overlap. So 
evaluating this sum amounts to m*-fold addition of pairs of polynomials of 
degree less than n. Since w(l, 1, > 1 + t by virtue of (5), this last cost of 
nm^+* is also covered by the claimed complexity bound. 
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(iii) Write each pi as a polynomial in Y with coefScients from K[X], that is 

MX,Y)= ^ qij{X).Y^ 

0<j<m 

with all QijiX) of degree less than n. Iteratively compute the m poly- 
nomials gj{X) := g^{X) rem f{X), each of degree less than nm*, within 
time C'~(r?,m^+*) by fast division with remainder (see for example Theo- 
rem 9.6 in von zur Gathen & Gerhard 2003). 

By multiplying the matrix A := (qij) to the vector b := (gj) according to 
(ii), determine the m polynomials 

Pi{X):= qij{X) ■ gj{X), 0<i<m 

0<j<m 

of degree less than n + nm' . For each i reduce again modulo f{X) and 
obtain pi(^X, g{X)^ lem f{X) using another 0^{nm^~^*) operations. Since 
(^(Ijlji) > 1 + t according to (5), both parts are covered by the claimed 
running time 0'^{nm'^^^'^'*^). □ 

Lemma 10 puts us in position to prove Theorem 9. 

Proof (Theorem g). Without loss of generality we assume that m is a square. 
We use a baby step, giant step strategy: Partition p into \/rn polynomials Pi of 
degy(j3i) < \/rn, that is 

p{X,Y) = Pi{X,Y)-Y'^ . 

0<i<^/m 

Then apply Lemma lO(iii) with t = 2 and m replaced by y/m to obtain the ^/m 
polynomials Pi{X) := pi(^X, g{X)) lem f{X) within 0~{nm'^^^^) operations. 

Iteratively determine the ^/rn polynomials gi{X) := (^g{X)^y lem f{X) for 
< i < ^/m within 0~(nm^/^). Again, 0^2 > 3 asserts this to remain in the 
claimed bound. Finally compute 

p{X,g{X))vemf{X) = ^ (pi{X) ■ g^iX)) vem f{X) 

0<i<\/m 

using another time 0'^(nm^/^). □ 

Based on Theorem 9, the following algorithm realizes the idea expressed in 
Section 2. 

Algorithm 11. Generic multipoint evaluation of a bivariate polynomial. 

Input: CocfRcients of a polynomial p G K[X, F] of degxip) < n, dcgy(p) < m 
and points {xk,yk) for < A; < nm with pairwise different first coordi- 
nates Xk- 
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Output: The values p{xk,yk) for < fc < nm. 

1. Compute the univariate polynomial f{X) := ^ ^ ^k) ^ 

0<fc<nm 

2. Compute an interpolation polynomial g € K[X] of degree less than nm 
satisfying g{xk) = yk for all < A; < nm. 

3. Apply Theorem 9 to obtain p{X) := p{X,g{X)) rem f{X). 

4. Multi-evaluate this univariate polynomial p G K[X] of degree less than nm 
at the nm arguments Xk- 

5. Return (p(a;fc))o<fc<nm- 

Proof (Theorem 8). The algorithm is correct by construction. 

Step 1 in Algorithm 11 can be done in 0'^{nm) arithmetic operations. As the 
points {xk,yk) have pairwise different first coordinates, the interpolation problem 
in Step 2 is solvable and, by virtue of Fact l(iii), in running time 0"'{nm). For 
Step 3 Theorem 9 guarantees running time 0'^(nm'^^/^). According to Fact l(ii). 
Step 4 is possible within time O'^(nm). Summing up, we obtain the claimed 
running time. □ 



6 Evaluating at degenerate points 

Here we indicate how certain fields IK permit to remove the condition on the 
evaluation point set imposed in Theorem 8. The idea is to rotate or shear the 
situation slightly, so that afterwards the point set has pairwise different first 
coordinates. To this end choose 6 arbitrary such that 

#{a;fc + %fe|0<fc<7V} = N (12) 

where N := nm denotes the number of points. Then replace each {xk,yk) by 
{x'k,y'k) ■= {xk + dyk,yk) and the polynomial p by y) := p{X — 0Y,Y). This 
can be done with (n^ + m^) arithmetic operations, see the more general 
Lemma 14 below. In any case a perturbation like this might even be a good 
idea if there are points whose first coordinates are 'almost equal' for reasons of 
numerical stability. 

Lemma 13. Let K denote a field and P = {{xk,yk) €K'^\0 <k < N} a col- 
lection of N planar points. 

(i) If #K > N'^, then e K chosen uniformly at random satisfies (12) with 
probability at least 5. Using C(log A^) guesses and a total of 0{N ■ log^ A'^) 
operations, we can thus find an appropriate 9 with high probability. 
If K is even infinite, a single guess almost certainly sufEces. 

(a) In case K = R or K = C, we can deterministically find an appropriate in 
time 0{N ■ log TV). 

(Hi) For a fixed proper extension field L of K, any 9 G'L\K will do. 
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Applying (i) or (ii) together with Lemma 14 affects the running time of Theo- 
rem 8 only by the possible change in the ^-degree. Using (iii) means that all 
subsequent computations must be performed in L. This increases all further costs 
by no more than an additional constant factor depending on the degree [L : K] 
only. 

Proof, (i) Observe that an undesirable 6 with Xk + Oyu = Xk' + Oyk' impHes 
Vk = Vk' or 9 = ^''^f ■ In the latter case, 9 is thus uniquely determined by 

{k,k'}. Since there are at most (^) < such choices {k,k'}, no more 

than half of the #IK > A''^ possible values of 9 can be undesirable. 

(ii) If K = R choose > such that 6*- (j/max — 2/min) < min {xk — Xk' \ xk > Xk'}- 
Such a value 9 can be found in linear time after sorting the points with 
respect to their x-coordinate. 

In case IK = C, we can do the same with respect to the real parts. 

(iii) Simply observe that 1 and 9 are linearly independent. □ 

We now state the already announced 

Lemma 14. Let R be a commutative ring with one. Given n € N and the v? 

coefficients of a polynomial p{X, Y) G R[X, Y] of degree less than n in both 
X and Y . Given furthermore a matrix A G B?^^ and a vector h G E? . From 
this, we can compute the coefficients of the afEnely transformed polynomial 
p{aiiX + ai2Y + bi,a.2iX + 022^ + 62) using 0{n^ ■ log^ n • log log n) or 0^{n'^) 
arithmetic operations over R. 

In the special case R = C we can decrease the running time to 0(v? logn). 

Lemma 14 straight-forwardly generalizes to rf-variate polynomials and rf-dimen- 
sional afRne transformations being applicable within time ©"(n**) for fixed d. 

Proof. We prove this in several steps. 

— First we note that, over any commutative ring S with one, we can compute 
the Taylor shift p{X + a) of a polynomial p € S[X] of degree less than n by 
an element a G S using 0{n ■ log^ n ■ log log n) arithmetic operations in S. 
There are many solutions for computing the Taylor shift of a polynomial. 
We would like to sketch the divide and conquer solution from Fact 2.1(iv) in 
von zur Gathen (1990) that works over any ring S: Precompute all powers 
{X + of ior Q <i <y := \ \0g2 n\ . Then recursively split p{X) = po{X) + 
X'^"pi{X) with degpo < 2" and calculate p{X + a) pa{X + a) + (X + 
of' Pi{X + a). This amounts to 0{n ■ log^ n • log logn) multiplications in 
S and O(nlogn) other operations. So we achieve this over any ring S with 
0{n ■ log^ n • log log n) operations. 

- Next let S = R\Y\. Then we can use the previous to compute p{X + a, Y) 
or p{X + aY, Y) for a polynomial p € R\X, Y] = S[X] of maximum degree 
loss than n and an element a G R. Using Kronecker substitution for the 
multiplications in R[X,Y] this can be done with 0(n^ • log^n • log logn) 
arithmetic operations in R. 
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- Now we prove the assertion. Scaling is easy: p{x,y) ^ p{ax,y) obviously 
works within O(n^) steps. Use this and the discussed shifts once or twice. 

The solution to Problem 2.6 in Bini & Pan (1994) allows to save a factor logn • 
log log n when R = S = C □ 

7 Conclusion and Further Questions 

We lowered the upper complexity bound for multi-evaluating dense bivariate 
polynomials of degree less than n with coefficients at points with pair- 
wise different first coordinates from naive 0{n'^) and C'~(n'^) to 0{n^^^^'^). The 
algorithm is based on fast univariate polynomial arithmetic together with fast 
matrix multiplication and will immediately benefit from any future improvement 
of the latter. 

With the same technique, evaluation of a trivariate polynomial of maximum 
degree less than n at points can be accelerated from nai'vc 0{n^) to 0{n*-^^'^). 

Regarding that the matrix multiplication method of Huang & Pan (1998) has 
huge constants hidden in the big-Oh notation, it might in practice be preferable 
to use either the nai've 2m^ or Strassen's A.lm?-^^ algorithm (with some tricks). 
Applying them to our approach still yields bivariate multipoint evaluation within 
time O(n^) or 0(n^'^^), respectively, with small big-Oh constants and no hidden 
factors logn in the leading term, that is, faster than Theorem 3. 

Further questions to consider are: 

— Is it possible to remove even the divisions? This would give a much more 
stable algorithm and it would also work over many rings. 

— As w > 2, the above techniques will never get below running times of order 
n^'^. Can we achieve an upper complexity bound as close as 0"'(n^) to the 
information theoretic lower bound? 

- Can multipoint evaluation of trivariate polynomials p(Xi,X2,X3) be per- 
formed in time 0(71**)? 

Acknowledgements The authors wish to thank David Eppstein (2004) for an 
inspiring suggestion that finally led to Lemma 13(ii). 
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